feat: enhance tree-sitter parsers for multiple languages #2420
- Enhanced the Tree-Sitter parser for JavaScript/TypeScript with support for advanced language constructs
- Modified the parser to exclude comments from the output
- Consolidated sample code in tests for better maintainability

Signed-off-by: Eric Wheeler <[email protected]>
This enhancement significantly expands the C++ parser's capabilities to recognize and extract a wide range of modern C++ language constructs, improving code navigation and analysis.

New supported language constructs include:
- Union declarations and their members
- Destructors and their implementations
- Operator overloading (including stream operators)
- Free-standing and namespace-scoped functions
- Enum declarations (both traditional and scoped enum class)
- Lambda expressions and their captures
- Attributes and annotations
- Method overrides with virtual/override specifiers
- Exception specifications (noexcept)
- Default parameters in function declarations
- Variadic templates and parameter packs
- Structured bindings (C++17)
- Inline namespaces and nested namespace declarations
- Template specializations and instantiations
- Constructor implementations

This enhancement provides more comprehensive code structure analysis for C++ codebases, particularly those using modern C++ features from C++11, C++14, and C++17 standards.

Signed-off-by: Eric Wheeler <[email protected]>
This enhancement significantly expands the Go parser's capabilities to recognize and extract a comprehensive set of language constructs:
- Added support for struct and interface definitions with proper type identification
- Implemented parsing for constant declarations (both single and in blocks)
- Added support for variable declarations (both single and in blocks)
- Added recognition of type aliases with proper distinction from regular types
- Implemented special handling for init functions
- Added support for anonymous functions, including nested function literals
- Improved documentation and organization of query patterns

These enhancements enable more accurate code navigation, better symbol extraction, and improved code intelligence for Go codebases.

Signed-off-by: Eric Wheeler <[email protected]>
This enhancement significantly expands the Java parser's capabilities to recognize and parse a wide range of Java language constructs:
- Added support for enum declarations and enum constants
- Added support for annotation type declarations and elements
- Added support for field declarations
- Added support for constructor declarations
- Added support for lambda expressions
- Added support for inner and anonymous classes
- Added support for type parameters (generics)
- Added support for package and import declarations

These improvements enable more comprehensive code analysis for Java projects, providing better definition extraction and navigation capabilities.

Signed-off-by: Eric Wheeler <[email protected]>
…ures

This commit significantly enhances the Python tree-sitter parser to support a comprehensive range of Python language constructs, enabling more accurate and detailed code analysis.

Key improvements:
- Added support for method definitions (instance, class, and static methods)
- Added support for decorators on functions and classes
- Added support for module-level variables and constants
- Added support for async functions and methods
- Added support for property getters/setters
- Added support for type annotations in various contexts
- Added support for dataclasses
- Added support for nested functions and classes
- Added support for generator functions
- Added support for list/dict/set comprehensions
- Added support for lambda functions
- Added support for abstract base classes and methods

The parser now handles Python's rich feature set more comprehensively, including special Python patterns like decorators, type annotations, and various comprehension types. This enables better code navigation, understanding, and analysis for Python codebases.

Signed-off-by: Eric Wheeler <[email protected]>
The pull request enhances tree-sitter parsers for multiple languages, including C++, Go, Java, Python, TypeScript, and TSX. While the changes are extensive, they are all related to the same feature enhancement across different languages. Therefore, it is not necessary to split this pull request into smaller ones, as the changes are cohesive and contribute to a single feature enhancement.
this looks big but most of it is tests. these are the substantive changes:

$ git diff --stat origin/main src/services/tree-sitter/queries/
 src/services/tree-sitter/queries/cpp.ts        |  85 +++++++++++-
 src/services/tree-sitter/queries/go.ts         |  51 +++++++
 src/services/tree-sitter/queries/java.ts       |  55 +++++++-
 src/services/tree-sitter/queries/python.ts     | 191 ++++++++++++++++++++++++++
 src/services/tree-sitter/queries/tsx.ts        |  41 +++++-
 src/services/tree-sitter/queries/typescript.ts |  32 +++++
 6 files changed, 448 insertions(+), 7 deletions(-)
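For readers unfamiliar with the files listed in the diffstat, here is a rough sketch of the shape a tree-sitter query module can take. It is not the content of the PR's go.ts. The constant name `goQuerySketch`, the helper `captureNames`, and the `@name.definition.*` / `@definition.*` capture naming (borrowed from the tree-sitter tags convention) are assumptions made for illustration. The node and field names come from the tree-sitter Go grammar.

```typescript
// Hypothetical sketch of a query module; NOT the PR's actual go.ts.
// Each pattern captures a definition node and the identifier that names it.
const goQuerySketch = `
; Function declarations
(function_declaration
  name: (identifier) @name.definition.function) @definition.function

; Method declarations
(method_declaration
  name: (field_identifier) @name.definition.method) @definition.method

; Named types such as structs and interfaces
(type_spec
  name: (type_identifier) @name.definition.type) @definition.type
`

// Illustrative helper (an assumption, not part of the PR): lists the distinct
// capture names used in a query string, in order of first appearance.
function captureNames(query: string): string[] {
  const names: string[] = []
  for (const line of query.split("\n")) {
    // Everything after ';' is a query comment, so only scan what precedes it.
    const code = line.split(";")[0]
    const pattern = /@([A-Za-z_][\w.]*)/g
    let match: RegExpExecArray | null
    while ((match = pattern.exec(code)) !== null) {
      if (names.indexOf(match[1]) === -1) names.push(match[1])
    }
  }
  return names
}

const sketchCaptures = captureNames(goQuerySketch)
```

Under this sketch, supporting a new construct means adding one more pattern to the string, which would be consistent with most of the diffstat being insertions.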
* feat(bedrock): adding two regions * changeset
Context
This PR enhances the tree-sitter parsers for multiple languages to improve code navigation and analysis capabilities, specifically targeting languages tested in the cte/evals repository.
Implementation
Enhanced tree-sitter parsers for C++, Go, Java, Python, TypeScript, and TSX.
Each language parser has been updated to support a comprehensive range of language constructs, significantly improving the ability to extract and navigate code definitions.
Benefits for Evaluation Testing
These enhancements will provide better code navigation and understanding when working with evaluation exercises from https://github.com/cte/evals, which contains exercises for all the enhanced languages.
Testing Recommendation
It would be beneficial to test an eval series with "File read auto-truncate threshold" set to zero. This configuration will provide a rich set of program definition line numbers, reducing context and providing additional focus without distraction from content that is no longer relevant.
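To make the "program definition line numbers" idea concrete, here is a minimal sketch of how extracted definitions could be rendered as a compact index instead of full file content. The `DefinitionRange` type, the `renderDefinitionIndex` function, and the `start--end | text` layout are assumptions for illustration, not the project's confirmed output format.

```typescript
// Hypothetical shape for one extracted definition; lines are 1-based, inclusive.
interface DefinitionRange {
  startLine: number
  endLine: number
}

// Renders one "start--end | first line of the definition" row per range,
// ordered by starting line. The row layout is an assumed, illustrative format.
function renderDefinitionIndex(source: string, ranges: DefinitionRange[]): string {
  const lines = source.split("\n")
  return ranges
    .slice() // copy so the caller's array is not reordered
    .sort((a, b) => a.startLine - b.startLine)
    .map((r) => `${r.startLine}--${r.endLine} | ${(lines[r.startLine - 1] ?? "").trim()}`)
    .join("\n")
}

// A small Go file with one struct and one method.
const sample = [
  "package main",
  "",
  "type Point struct {",
  "\tX, Y int",
  "}",
  "",
  "func (p Point) Sum() int {",
  "\treturn p.X + p.Y",
  "}",
].join("\n")

const index = renderDefinitionIndex(sample, [
  { startLine: 7, endLine: 9 },
  { startLine: 3, endLine: 5 },
])
```

For the sample above, `index` holds two rows, one for the struct on lines 3 to 5 and one for the method on lines 7 to 9. A listing like this tells the reader where each definition lives without the bodies in between.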
Get in Touch
Discord: KJ7LNW
cc: @cte
Important
Enhances tree-sitter parsers for multiple languages, improving support for various language constructs and adding comprehensive tests and queries.
Tests: parseSourceCodeDefinitions.cpp.test.ts, parseSourceCodeDefinitions.go.test.ts, parseSourceCodeDefinitions.java.test.ts, parseSourceCodeDefinitions.python.test.ts, and parseSourceCodeDefinitions.tsx.test.ts.
Queries: cpp.ts, go.ts, java.ts, python.ts, tsx.ts, and typescript.ts.
This description was generated automatically for c2dda05. It will automatically update as commits are pushed.